A Survey of Automatic Indexing Techniques for Thai Text Documents

Author

  • Todsanai Chumwatana (Faculty of Information Technology, Rangsit University)

Abstract

With the number of Thai text documents in digital media and on websites increasing rapidly, an efficient text indexing technique is needed to facilitate search and retrieval. An efficient index speeds up response time and improves the accessibility of documents. To date, far less research has been conducted on Thai text indexing than on more widely studied languages such as English and other European languages. In Thai text indexing, extracting index terms is the main issue: because Thai text is non-segmented (written without explicit word boundaries), index terms cannot be identified directly from the documents. As a result, indexing Thai text documents poses many challenges. Most Thai text indexing techniques fall into two main categories, language-dependent and language-independent, both of which are described in this paper.
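To make the segmentation problem concrete, the following is a minimal Python sketch, not the method surveyed in the paper. It shows that whitespace tokenization returns a whole Thai phrase as a single "term", and then builds a small inverted index from overlapping character n-grams, which is one commonly used language-independent approach (chosen here as an assumption; the abstract does not name a specific technique). The function names `whitespace_terms`, `char_ngrams`, and `build_ngram_index`, and the sample documents, are hypothetical.

```python
from collections import defaultdict


def whitespace_terms(text):
    """Split on whitespace, as is typical for English index terms."""
    return text.split()


def char_ngrams(text, n=3):
    """Return overlapping character n-grams (by Unicode code point).

    Whitespace is removed first, so no word boundaries or dictionary
    are needed. Assumption: code-point n-grams are good enough for a
    sketch, even though they may split a consonant from its vowel or
    tone mark.
    """
    compact = "".join(text.split())
    return [compact[i:i + n] for i in range(len(compact) - n + 1)]


def build_ngram_index(docs, n=3):
    """Map each n-gram to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in char_ngrams(text, n):
            index[gram].add(doc_id)
    return index


# Hypothetical sample documents: "Thai language" and "English language".
docs = {1: "ภาษาไทย", 2: "ภาษาอังกฤษ"}
index = build_ngram_index(docs, n=3)

# Whitespace splitting finds one "term" in the Thai phrase.
print(len(whitespace_terms("ภาษาไทยไม่มีช่องว่างระหว่างคำ")))
# Documents sharing the trigram "ภาษ" (from the word for "language").
print(sorted(index["ภาษ"]))
```

A language-dependent technique would instead segment the text into words first, typically with a dictionary or a trained model, and index those words. The n-gram sketch avoids that step at the cost of a larger index containing many fragments that are not real words.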

Similar resources

A survey on Automatic Text Summarization

Text summarization endeavors to produce a summary version of a text while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to sift through such a massive amount of data in order to extract the useful information is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...

An Enhancement of Thai Text Retrieval Efficiency by Automatic Backward Transliteration

Loan words, which are borrowed from foreign languages, are used in many languages such as Japanese, Chinese, Korean and Thai. They affect Thai Text Retrieval (TTR) systems, leading to inaccurate term weights for indexing and text clustering. Therefore, there is a need for automatic backward transliteration that can solve this problem. In this paper, we propose a hybrid model approa...

A Two-Stage Gap-Selection Model for Automatic Indexing of Persian Texts

Purpose: Each language has its own problems, so appropriate models for automatic indexing must be considered for each language. These models should address the exhaustivity and specificity of indexing. This paper aims to introduce and evaluate a model suited to Persian automatic indexing. The model proposes to break the text into particles of candidate terms and to c...

A Survey of Indexing and Retrieval of Multimodal Documents: Text and Images

A document conveys information using multiple modalities, including text, layout/style and images. For example, journal articles usually have figures to illustrate experimental results, and the title in a journal article usually has a different font size than the body text. Indexing and retrieval using only text is the traditional way of IR (Information Retrieval). With the development of the I...

Segmentation of Thai Handwritten Text for Automatic Document Retrieval

There is a huge number of documents in Thai government organizations. Although automatic document image retrieval systems for English have been proposed and developed, there is no specific system capable of retrieving relevant information from documents in the Thai language. While word matching or optical character recognition (OCR) can be applied, segmentation of the words and characters i...

Publication date: 2013